A machine learning approach for thermodynamic modeling of the statically measured solubility of nilotinib hydrochloride monohydrate (anti-cancer drug) in supercritical CO2

Nilotinib hydrochloride monohydrate (NHM) is an anti-cancer drug whose solubility in supercritical carbon dioxide (SC-CO2) was determined statically for the first time at various temperatures (308–338 K) and pressures (120–270 bar). The mole fraction of the drug dissolved in SC-CO2 ranged from 0.1 × 10⁻⁵ to 0.59 × 10⁻⁵, corresponding to a solubility range of 0.016–0.094 g/L. Four sets of models were employed to correlate the experimental data: (1) ten empirical and semi-empirical models with three to six adjustable parameters, namely Chrastil, Bartle, Sparks, Sodeifian, Mendez-Santiago and Teja (MST), Bian, Jouyban, Garlapati-Madras, Gordillo, and Jafari-Nejad; (2) the Peng-Robinson equation of state with the van der Waals mixing rule (AARD% of 10.73); (3) expanded liquid theory with the modified Wilson model (average AARD% of 11.28); and (4) machine learning (ML) algorithms (random forest, decision trees, multilayer perceptron, and deep neural network, with respective R² values of 0.9933, 0.9799, 0.9724, and 0.9701). All the models showed acceptable agreement with the experimental data; among them, the Bian model exhibited excellent performance, with an AARD% of 8.11. Finally, the vaporization (73.49 kJ/mol) and solvation (−21.14 kJ/mol) enthalpies were also calculated for the first time.


1S. Introduction
NHM is used to treat certain types of chronic myelogenous leukemia, a form of bone marrow-derived white blood cell cancer. The term "chronic" indicates that the disease progresses more slowly than the acute type, while the term "myeloid" refers to a group of malignant bone marrow cells with irregular and extensive expansion. Chronic myeloid leukemia is more common in the middle-aged and older population and is rare among children, although pediatric cases are documented as well. Moreover, NHM is a promising alternative in cases where the patient did not respond to imatinib or cannot tolerate its side effects. NHM belongs to a family of medicines known as enzyme inhibitors. It can also be employed to treat children above the age of one who do not respond to existing enzyme inhibitor medications or cannot take these drugs due to their toxicity. NHM inhibits an atypical protein from activating the proliferation of cancerous cells, which helps to prevent or slow the expansion of the malignancy.

Table 1S presents a summary of the EoS-based model. Consequently, a suitable EoS and a definite mixing rule (e.g., vdW) are utilized to determine the fugacity coefficient of the solid solute in the fluid phase, denoted by $\hat{\varphi}_2$.

2S. EoS-based model
The fugacity coefficient $\hat{\varphi}_2$ is estimated using the PR-EoS in conjunction with vdW2, where $n_2$ represents the moles of solute, $n_1$ denotes the moles of CO2, and $v$ represents the molar volume of the mixture.
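For orientation, the textbook relations behind this kind of calculation can be written out as follows. This is a sketch in standard notation, not necessarily the exact form or symbols used in Table 1S: the fugacity-coefficient integral is written in terms of the total volume $V$, and the binary interaction parameters $k_{ij}$ and $l_{ij}$ are assumed names for the two adjustable parameters of a two-parameter van der Waals mixing rule.

```latex
% Fugacity coefficient of the solute (component 2) from a pressure-explicit EoS
\ln \hat{\varphi}_2 = \frac{1}{RT}\int_{V}^{\infty}
  \left[\left(\frac{\partial P}{\partial n_2}\right)_{T,V,n_1} - \frac{RT}{V}\right] dV - \ln Z

% Peng-Robinson equation of state (v: molar volume of the mixture)
P = \frac{RT}{v - b} - \frac{a}{v(v + b) + b(v - b)}

% Two-parameter van der Waals mixing rule (assumed notation)
a = \sum_i \sum_j y_i y_j \sqrt{a_i a_j}\,(1 - k_{ij}), \qquad
b = \sum_i \sum_j y_i y_j \,\frac{b_i + b_j}{2}\,(1 - l_{ij})
```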

3S. ELT model
Prausnitz et al. presented the equation relating the activity coefficient ($\gamma_2$), the solid solubility in mole fraction ($y_2$), and the fugacity of the pure solid solute ($f_2$) in the expanded liquid phase 38,39:
The heat capacity terms in the above equation can be ignored to a reasonable degree. Combining Eqs. (2) and (3), an equation can be derived for the solute solubility:
The enthalpy of fusion, the melting point temperature of the solid solute, and the activity coefficient of the solid solute at infinite dilution are represented by $\Delta H_2^f$, $T_m$, and $\gamma_2^\infty$, respectively.
The modified Wilson model includes a combinatorial contribution based on Flory's theory and the excess Gibbs energy 38,
where $g^E$ is the excess Gibbs energy, and the two binary parameters (subscripts 12 and 21) are adjustable.
These two parameters are expressed at infinite dilution:
The reduced density of the SCF is denoted by $\rho_r$ ($\rho_r = \rho/\rho_c$), where $\rho_c$ represents the critical density. The molar volume of the solid solute is represented by $v_2$.
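The solubility relation described above can be illustrated with a minimal Python sketch. It assumes the standard solid-liquid equilibrium form with the heat capacity terms neglected, $y_2 = \frac{1}{\gamma_2^\infty}\exp\left[-\frac{\Delta H_2^f}{RT}\left(1 - \frac{T}{T_m}\right)\right]$, which may differ in notation from the source equation. The function name and the numerical inputs are hypothetical and are not values for NHM; the infinite-dilution activity coefficient is passed in directly rather than computed from the modified Wilson model.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def elt_solubility(T, delta_h_fus, T_m, gamma_inf):
    """Mole-fraction solubility y2 of a solid solute (hypothetical helper).

    T           : temperature, K
    delta_h_fus : enthalpy of fusion of the solute, J/mol
    T_m         : melting point temperature of the solute, K
    gamma_inf   : activity coefficient of the solute at infinite dilution
    Heat capacity terms are neglected, as stated in the text.
    """
    # Ideal solubility: ratio of solid to subcooled-liquid fugacity of the solute
    ideal = math.exp(-delta_h_fus / (R * T) * (1.0 - T / T_m))
    # Non-ideality enters only through the infinite-dilution activity coefficient
    return ideal / gamma_inf


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the source)
    for T in (308.0, 318.0, 328.0, 338.0):
        print(T, elt_solubility(T, 30000.0, 500.0, 1.0e3))
```

In this form, the temperature dependence comes from the fusion term, while the pressure (density) dependence would enter through $\gamma_2^\infty$ once it is modeled as a function of the reduced density.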

3S. Semi-empirical models
SSE denotes the sum of squared errors, while SST denotes the total sum of squares.
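As a minimal sketch of how these quantities are typically used, the coefficient of determination follows from $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. The AARD% helper below assumes the plain definition averaged over the number of data points; the source may use a different denominator (for example, one corrected for the number of adjustable parameters). Function names and sample numbers are hypothetical.

```python
def r_squared(y_exp, y_calc):
    """Coefficient of determination, R^2 = 1 - SSE/SST."""
    mean_exp = sum(y_exp) / len(y_exp)
    # SSE: sum of squared errors between experimental and calculated values
    sse = sum((e - c) ** 2 for e, c in zip(y_exp, y_calc))
    # SST: total sum of squares of the experimental values about their mean
    sst = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - sse / sst


def aard_percent(y_exp, y_calc):
    """Average absolute relative deviation in percent (assumed plain-N form)."""
    n = len(y_exp)
    return 100.0 / n * sum(abs(c - e) / e for e, c in zip(y_exp, y_calc))


if __name__ == "__main__":
    # Illustrative numbers only, not data from the source
    y_exp = [1.0e-6, 2.5e-6, 4.0e-6, 5.9e-6]
    y_calc = [1.1e-6, 2.4e-6, 4.2e-6, 5.6e-6]
    print(r_squared(y_exp, y_calc), aard_percent(y_exp, y_calc))
```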

4S. Random Forests (RF)
The Gini index technique evaluates the subset of features selected at each inner node. The split feature at that node is chosen as the element with the highest Gini index. The Gini index 56 was formulated by Breiman, Friedman, Olshen, and Stone to assess data purity, or the confidence in the occurrence of an event that would determine the classification label.
However, it should be noted that the index was originally developed in 1912 by Corrado Gini, an Italian statistician. In its most basic form, the Gini index can be computed by:
where the index is evaluated on the data set under examination and $C_i$ refers to the label assigned to each class within the data set.
The estimations are detailed in Table 2S.
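To make the Gini index discussed in this section concrete, the following sketch uses the common impurity form, $1 - \sum_i p_i^2$, where $p_i$ is the fraction of samples carrying class label $C_i$. This is an assumption about which variant is meant; with the impurity form, lower values indicate purer nodes, and CART-style implementations usually prefer the split with the lowest weighted impurity. The function names are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a collection of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left, right):
    """Size-weighted Gini impurity of the two children produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


if __name__ == "__main__":
    parent = ["a", "a", "b", "b"]
    print(gini_index(parent))                      # impurity before splitting
    print(weighted_gini(["a", "a"], ["b", "b"]))   # a split that separates the classes
```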